Chinese Chunking Based on Maximum Entropy Markov Models

نویسندگان

  • Guang-Lu Sun
  • Changning Huang
  • Xiaolong Wang
  • Zhi-Ming Xu
چکیده

This paper presents a new Chinese chunking method based on maximum entropy Markov models. We firstly present two types of Chinese chunking specifications and data sets, based on which the chunking models are applied. Then we describe the hidden Markov chunking model and maximum entropy chunking model. Based on our analysis of the two models, we propose a maximum entropy Markov chunking model that combines the transition probabilities and conditional probabilities of states. Experimental results for two types of data sets show that this approach achieves impressive accuracy in terms of the F-score: 91.02% and 92.68%, respectively. Compared with the hidden Markov chunking model and maximum entropy chunking model, based on the same data set, the new chunking model achieves better performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Chunking Strategy Towards Unknown Word Detection in Chinese Word Segmentation

This paper proposes a chunking strategy to detect unknown words in Chinese word segmentation. First, a raw sentence is pre-segmented into a sequence of word atoms 1 using a maximum matching algorithm. Then a chunking model is applied to detect unknown words by chunking one or more word atoms together according to the word formation patterns of the word atoms. In this paper, a discriminative Mar...

متن کامل

Selecting Optimal Feature Template Subset for CRFs

Conditional Random Fields (CRFs) are the state-of-the-art models for sequential labeling problems. A critical step is to select optimal feature template subset before employing CRFs, which is a tedious task. To improve the efficienc y of t his step, we propose a new method that adopts the maximum entropy (ME) model and maximum entropy Markov models (MEMMs) instead of CRFs considering the homolo...

متن کامل

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms

We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We giv...

متن کامل

Complete Syntactic Analysis Bases on Multi-level Chunking

This paper describes a complete syntactic analysis system based on multi-level chunking. On the basis of the correct sequences of Chinese words provided by CLP2010, the system firstly has a Part-ofspeech (POS) tagging with Conditional Random Fields (CRFs), and then does the base chunking and complex chunking with Maximum Entropy (ME), and finally generates a complete syntactic analysis tree. Th...

متن کامل

Taylor Expansion for the Entropy Rate of Hidden Markov Chains

We study the entropy rate of a hidden Markov process, defined by observing the output of a symmetric channel whose input is a first order Markov process. Although this definition is very simple, obtaining the exact amount of entropy rate in calculation is an open problem. We introduce some probability matrices based on Markov chain's and channel's parameters. Then, we try to obtain an estimate ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJCLCLP

دوره 11  شماره 

صفحات  -

تاریخ انتشار 2006